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ABSTRACT 


The  Kalman  filter  sequentially  generates  the  minimum  variance  estimate  of 
the  state  of  a  linear  dynamic  system.  This  estimate  is  a  function  of  the 
covariance  parameters  of  the  dynamic  system  model  which  implies  that  these 
be  known  a  priori.  Unfortunately  some  or  all  these  covariance  parameters 
are  often  unknown  in  engineering  applications  of  the  Kalman  filter.  In  this 
report  the  maximum- likelihood  estimates  of  the  unknown  covariance  param¬ 
eters  of  a  time -discrete  nonstationary  linear  system  are  computed  from 
measurement  residuals  of  a  suboptimal  sequential  filter.  Results  for  non¬ 
stationary  linear  systems  are  useful  for  nonlinear  systems  because  most 
nonlinear  estimation  problems  are  solved  by  linearization  which  results  in 
linear  nonstationary  plant  and  measurement  models. 
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SECTION  I 


INTRODUCTION 


The  problem  of  estimating  the  states  of  dynamic  systems  with  unknown 
covariance  parameters  has  been  studied  by  several  investigators.  Bucy  and 
Follin  published  some  significant  results  for  linear  stationary  dynamic  sys¬ 
tems  in  1962  (Ref.  1]  not  long  after  Kalman's  paper  [Ref.  2]  appeared. 

Many  navigation  problems  are  nonstationary  and  the  results  for  stationary 
systems  do  not  apply.  The  reason  is  very  basic:  time-averages  cannot  be 
interchanged  with  ensemble-averages  in  nonstationary  problems.  This 
interchange  is  the  crux  of  the  methods  used  in  stationary  problems.  Jazwinski 
[Ref.  3]  and  Abramson  [Ref.  4]  have  studied  nonstationary  systems. 

Ramon  Mehra  published  an  important  paper  on  identification  of  parameters 
in  linear  stationary  systems  (Ref.  5].  The  author's  results  reported  here 
are  direct  descendants  of  this  work  in  that  Mehra 's  formulation  is  general¬ 
ized  to  nonstationary  systems  while  the  variables  to  be  estimated  are 
restricted  to  constant  covariance  parameters. 

The  method  is  as  follows:  A  general  class  of  suboptimal  linear 
sequential  filters  is  defined  by  letting  the  gain  in  the  Kalman  filter  be  an 
arbitrary  (suboptimal)  residual  weighting  matrix.  Then  the  distribution  of 
the  measurement  residuals  generated  by  this  filter  is  derived.  The 
unknown  covariance  parameters  are  collected  into  a  vector  6,  and  Q  is  esti¬ 
mated  from  the  measurement  residuals  by  the  method  of  maximum  likelihood. 

A 

It  follows  immediately  that  the  likelihood  function  (and  hence  9)  depends  on 
the  sample  dispersion  matrix  of  the  measurement  residuals. 

The  likelihood  equations  are  too  complex  to  solve  for  the  estimate 
directly,  so  approximate  solutions  are  outlined.  In  a  special  case,  closed 
form,  minimum -me  an -square  unbiased  estimates  can  be  derived.  The 
example  of  a  satellite  attitude  determination  problem  is  discussed. 
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SECTION  II 


PLANT,  MEASUREMENTS,  AND  FILTER 


A  set  of  n-dimensional  vector -valued  random  variables  {x(i) :  i  =  1, 

.  . . ,  N}  are  generated  by  a  linear  stochastic  difference  equation  called  the 
plant 


X(i  +  1)  =  *<i)  x(i)  +  r(i)U(i)  (1) 

•where  {tl(i):  i  =  i,  ...»  N  -  1}  is  a  set  of  independent  vector -valued  random 
variables  each  of  which  is  distributed  Nj(0,  Q).^  The  initial  condition 
X(i)  for  Eq.  (1)  is  distributed  Nn(X,  M(l)).  The  values  of  the  random  vari¬ 
ables  {x(i):  i  =  1,  ....  N}  for  a  particular  realization  are  estimated  sequen¬ 
tially  from  a  series  of  m-dimensional  vector-valued  measurements  {z(i): 
i  =  1,  ...  N} which  are  related  to  the  states  by  N  linear  relations 

Z(i)  =  H(i)X(i)  +  w(i)  (2) 

where  {w(i):  i  =  1,  .  .  .  ,  N}  is  a  set  of  independent  vector -valued  random 
variables  each  of  which  is  distributed  Nm(0,  R).  The  matrices  M(l),  Q, 
and  R  are  positive  semidefinite.  In  many  engineering  problems,  elements 
of  M(l),  Q,  and  R  are  unknown.  Assume  the  random  variables  {u(i)  =  1, 

. .  .  ,  N  1}  and{w(i):  i  =  i,  ....  N}  and  X(i)  are  mutually  independent. 
Equation  (1)  is  often  the  result  of  linearizing  a  set  of  nonlinear  equations 
about  a  nominal  and  so  the  state  estimate  covariance  is  a  function  of  the 
nominal.  A  suboptimal  filter  for  the  states  is 
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The  notation  N|(0,  Q)  means  the  vector-valued  random  variable  U(i)  is 
normally  distributed  with  mean  0  and  covariance  Q.  The  dimension  of 
ll(i)  is  i. 


Preceding  page  blank 
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x'(i  +  1)  =  *(i)  X'(i)  +  0(i)  K(i)  V(i) 


(3) 


where 


V(i)  =  Z  (i)  -  H(i)  x'(i)  X'(i)  -  X  (4) 

and  where  the  known  sequence  of  gains  {K(i):  i  =  i,  ....  N}  has  finite  ele¬ 
ments  but  is  otherwise  arbitrary.  The  estimate  x'(i)  is  the  (suboptimal) 
one-step  extrapolated  estimate  of  X(i)  based  on  the  measurements  {z(i): 
j  =  1,  . . . ,  i  -  l}.  To  simplify  the  algebra,  some  notation  is 
introduced.  Rewrite  Eq,  (3)  using  {z(i):  i  =  1,  ...,  N}  as  the  input. 

X'(i  +  1)  =  *(i)  (i  -  K(i)  H(i))  x'(i)  +  f(i)  K(i)  z(i)  (5) 


The  fundamental  matrix  of  Eq.  (5)  is 


®(i  +  j.  i) 


[T(i  +  j  -  i)  •••  T(i) 
I  T(i) 


j>* 
j  =  i 
j  =  0 


(6) 


where 


T(i)  =  p(i)  (i  -  K(i)  H(i)J  (7) 

The  vector -valued  random  variables  x'(i)  and  V(i)  are  linear  combinations 
of  U(i)  andW(i),  and  so  it  follows  that  X'(i)  and  V(i)  are  multidimensional 
gaussian  random  variables.  The  distributions  of  x'(i)  and  V(i)  are  derived 
in  Ref.  6. 


V(i)  -  Nm(0,  B(i))  B(i)  =  H(i)  M(i)  HT(i)  +  R 


(8) 
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(9) 


C(i  +  j,  i)  =  E  [v(i  +  j)  VT(i)] 

C(i  +  j,  i)  =  H(i  +  j)  9(i  +  j,  i  +  1)  0( i)  (M(i)  HT(i)  -  K(i)  B(i))  (10) 

X'(i)  ~Nn(x(i),  M{i)) 

M(i  +  1)  =  T(i)  M(i)  TT(i)  +  ^(i)  K(i)  RKT(i)  *T(i)  +  r(i)  QrT(i)  (11) 

The  initial  condition  for  Eq.  (11)  is  M(l). 

Collect  the  unknown  elements  of  M(l),  Q,  and  R  into  an  r-dimensional 
parameter  vector  0  and  denote  its  true  value  by  The  relationship  between 
6  and  the  residuals  {v(i):  i  =  1,  ....  N}  is  established  in  the  following  lemma. 

Lemma  2.  1 

If:  (II)  {K(i):  i  =  1,  ....  N}are  given. 

Then:  (Rl)  The  elements  of  C(i  +  j,  i)  and  B(i)  are  affine  functions 

of  the  elements  of  6. 

Proof:  The  closed  form  for  the  covariance  equation  [Eq.  (11)]  is 

M(i)  =  4>(i,  1)  M(l)  #T(i,  1) 

i-1 

+  ^  *(i  +  1.  j  +  1)  *(j)  K(j)  RKT(j)  0T(j)  *T(i  +  1.  j  +  1) 

W 

i-1 

+  ^  *(i  +  l.  j  +  1)  r(j)  QrT(j)  *T(i  +  l,  j  +  1)  (12) 

w 
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Substitute  Eq.  (12)  into  Eq.  (8).  It  follows  that  the  elements  of  B(i)  are 
affine  functions  of  the  elements  of  M(l),  R,  and  Q  and  hence  of  $.  The  proof 
for  C(i  +  j,  i)  follows  in  the  same  way. 


QED 

The  significance  of  this  lemma  is  that  the  partials  ^1)?  and 

+  '  "1  n 
..  J.».  }L  are  independent  of  $. 

n 

The  Kalman  filter  is  of  course  an  important  special  case  in  the  general 
class  of  linear  sequential  filters.  Two  important  statistical  properties  of 
the  Kalman  filter  are  summarized  in  the  following  lemma. 


Lemma  2.  2 


If: 

(11) 

R  is  positive  definite. 

(12) 

K(i)  =  M(i)  HT(i)  B_1(i) 

(13) 

Then: 

(Ri) 

Tr  [M(i))  is  minimal.  If  M(i)  is  positive  definite, 

|  M(i)|  is  minimal. 

(R2) 

C(i  +  j,  i)  =  0  for  all  j. 

(14) 

Proof:  See  Ref.  6. 

If  R  is  singular  then  the  pseudo  inverse  is  used  in  12  if  B(i)  is 

singular. 

R2  is  not  a  sufficient  condition  for  R1  [Ref.  6.] 
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SECTION  III 


ESTIMATION  OF  COVARIANCE  PARAMETERS 


The  measurement  residuals  from  a  suboptimal  filter  are  not  statisti¬ 
cally  independent  and  the  usual  methods  in  parameter  estimation  must  be 
extended  to  dependent  samples  in  order  to  estimate  9. 

Define  a  composite  vector  V(N)  of  the  measurement  residuals  {v (i)  r 
i  =  1,2  ....  N}  generated  by  the  suboptimal  sequential  filter.  V(N)  is  a 
p-dimensional  random  variable  distributed  1^(0,  Z(N)),  where  p^  Nm  and 
where  E(N)  is  the  following  composite  matrix: 


B(  1) 

C(l,2) 

C(l,  3) 

....  C(1,N) 

C(2, 1) 

B(2) 

C(2, 3) 

_  C(2,  N) 

C(3,  1) 

C(3, 2) 

B(3) 

_  C(3,  N) 

Z(N)  = 


^C<N,  1)  C(N,  2)  C(N,  3) _  B(N)  J  (15) 

If  £(N)  is  positive  definite,  the  probability  density  of  V(N)  is 


P(V(N)|$)  =  (nT^)"P  |Z(N)|‘1/2  exp  J-  j  VT(N)  E_1(N)  V(N)J  (16) 

The  maximum  likelihood  estimate  maximizes  p(V(N)  |0)  (or  equivalently 
log  p  (V(N)|0))  for  the  observed  value  of  V(N). 
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max  log  p(V(N)|$)  =  log  p(V(N>|$) 

V 


(17) 


Define 


JN(«)  =  log  p(V(N)|0) 


(18) 


JN(0)  =  -  2  log  |Z(N)|  -  j  Tr  |V  *(N)  V(N)  VT(N)]  (19) 

A  necessary  condition  for  9  to  be  a  relative  maximum  of  Jj^(0)  is 


VN<«>  =# 


Define  the  score  (a  classical  term)  of  0 


sN(0)  =  V^JN(«)  = 


(20) 


(21) 


and  the  following  notation  for  the  partial  derivatives  of  £(N), 


W(N;i) 


A  8S(N) 
®®i 


(22) 


By  Lemma  2.  1,  W(N;i)  is  independent  of  9  for  all  i.  Hence  it  follows  that 
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Tr  [s_1(N)  W(N;j)  (i  -  2_1(N)  V(N)  VT(N))]  =  0  (23) 

j  =  i*  •  •  • »  r 

In  general,  Eq.  (23)  must  be  solved  numerically.  The  average  score  A^(0) 
is  defined  by 


8JN<*>  I 

38  j  2 


an(0)  $  e[sn(0)] 

The  variance  of  the  score  is  Fisher's  information  matrix  «dtj^(0)  and  is  given 
by 


*V*)  =  e[sn(«)  sjj  («)] 


The  expectations  which  define  (0)  and  cj^(9)  are  taken  over  the  probability 
density  function  p(V(N)|0g). 

The  Cramer -Rao  inequality  is  a  lower  bound  on  the  mean  square  error 
of  an  arbitrary  estimator^  (Ref.  7J.  If  9  is  an  unbiased  estimator  of  9, 
then 


cov[d]  a  (0O)  (24> 

A  A 

If  the  equality  in  Eq.  (24)  holds  for  a  particular  unbiased  estimate  9,  then0 
is  the  minimum -mean-square  unbiased  estimate  of  9q. 

The  information  matrix  is  a  measure  of  the  sensitivity  of  the  distribu¬ 
tion  of  the  residuals  with  respect  to  variations  in  the  unknown  parameter  9. 
At  one  extreme,  if  there  is  a  one-to-one  correspondence  between  each 
sequence  of  residuals  and  each  value  of  9,  then  the  sensitivity  or  information 
is  a  maximum.  At  the  other  extreme,  if  the  distribution  of  the  residuals  is 
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independent  of  6,  then  the  information  about  $  is  a  minimum.  An  expression 
for  the  information  contained  in  the  residuals  is  derived  next. 

Lemma  3.  1 

If:  (II)  E(N)  is  a  positive  definite  for  $  =  0^. 

Then:  (Ri)  A^)  =  0. 

(R2)  The  (i,  j)  element  of  equals 


-  E 


9ZJN(f) 


aei  ae. 


evaluated  at  B  =  Bn 


(R3)  E 


92Jn<0) 


86i  86. 


*  *  t  Tr 


[s'V)  W(n;j)  E” 1  (N)  W(N;i)]  (25) 


Proof:  Rl  and  R2  are  standard  results  which  follow  immediately 

from  the  definition  of  expectation  [Ref.  7j.  R3  follows 
from  Eq.  (19). 

As  an  example,  suppose  there  is  just  a  single  scalar  covariance 
parameter,  6.  From  Lemma  2.  1,  it  follows  that  2(N)  can  be  written 


2(N)  =  W(N)  6 


(26) 


A  closed  form  maximum  likelihood  estimate  of  6  can  be  obtained  in  this 
particular  case.  Drop  the  index  N  for  clarity.  From  Eq.  (23) 


(27) 
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But  since  £  =  W0  where  §  >  0,  it  follows  that 

W_1  VVT]  =  0  (28) 


6  =  ^  VT  W-1V  (29) 

The  following  lemma  is  an  application  of  the  Cramer-Rao  inequality  which 
shows  that  the  maximum  likelihood  estimate  in  this  case  is  also  the 
minimum -mean-square  unbiased  estimator  for  0. 

Lemma  3.2 


If: 

(ID 

0  >  0 

(12) 

W  is  positive  definite. 

Then: 

(Rl) 

0  =  ^  VT  W_1V 

(30) 

is  the  minimum -mean-square  unbiased  estimate  cf  0. 

Proof: 

(PI) 

From  Eq.  (29). 

ElS)  =  5T  Tr  [w"‘  we„]  =  e0 

(31) 

(P2) 

•% 

cov(e]  =  var(§]  -  8q 

(32) 

e[§2J  =  -L  e[(VT  W-V]  (33) 


S<0)  = 


k  Tr[ 


1 
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(34) 


E[02]  =  2  Trj^(W_1S)2]  +  (Tr  [(W_12)|)2 

E[e2]  =  |2Ne2  +  n2g2 

See  Ref.  6  for  the  details  of  this  last  step. 
covfl2]  =  |  e2  (36) 


(P3)  Fisher's  Information  is 

J(6)  =  i  Tr  [(S-lW)2]  =  f  £ 


(37) 


R1  follows  from  the  Cramer -Rao  inequality. 


In  general,  of  course,  the  maximum  likelihood  estimate  does  not  have  any 
optimal  properties  for  finite  N. 
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SECTION  IV 


APPROXIMATE  MAXIMUM  LIKELIHOOD  ESTIMATES 


In  general,  closed  form  solutions  of  Eq.  (23)  do  not  exist,  hence 

A 

Eq.  (23)  must  be  solved  numerically  for  0.  For  large  p,  storing  and 

inverting  E(N)  is  impractical.  A  feasible  approximate  solution  to  Eq.  (23) 

can  be  computed  if  the  elements  of  K(i)  are  reasonably  near  the  elements  of 
T  -1 

M(i)  H  (i)  B  (i)  in  value  and  the  eigenvalues  of  <b( i  +  j,  i)  decrease  to  zero 
for  increasing  j  for  all  i.  These  conditions  can  usually  be  fulfilled  in  prac¬ 
tice.  2(N)  may  then  be  approximated  by  a  band  matrix.  For  example, 
define  the  following  band  matrix  by  replacing  C(i  +  j,  i)  with  zeros  for 
j  =  3,  ....  N  -  i  i 


0 

0 

0 


1  =  1,  . . . 

,  N. 

B(l) 

C(l. 

2) 

0 

C(2,  1) 

B(2) 

C(2,  3) 

0 

C(3, 

2) 

B(3) 

22(N)  = 


B(N) 


(38) 


Band  matrices  retaining  higher  order  lag  correlations  can  be  defined  in  the 
same  way.  Special  algorithms  can  efficiently  compute  the  vector  E^^N) 
V(N)  and  the  matrix  2^(N)  W(N;i)  where  Efa(N)  is  a  band  matrix.  To  reduce 
the  computer  storage  requirements,  W(N;i)  may  be  approximated  by  a  band 
matrix  as  well.  An  iterative  technique  for  solving  Eq.  (23)  is  described  in 
Ref.  7. 
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SECTION  V 


EXAMPLE 


A  single  axis  analog  of  a  satellite  in  a  planar  orbit  is  examined  (see 
the  sketch  below).  The  problem  is  to  estimate  the  attitude  of  the  satellite 
with  respect  to  an  inertial  reference  line. 


A  star  mapper  attached  to  the  body  measures  0  at  time  t^. 

0m(t.)  =  0(t.)  +  w(i)  =  Y(t.)  -  »(t.)  +  w(i) 


(39) 


Preceding  page  blank 
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w(i)  is  the  measurement  error  and  Y  is  determined  from  star  charts.  A 
gyro  continuously  monitors  the  angular  rate  w 


w  =  w  +  n 
m 

(40) 

where  n  is  the  random  drift  rate.  The  attitude  error  is 

e,  where 

a  =  a  +  e  it  =  to 

o  o  m 

(41) 

e  =  <i-a_  =  w-  w _  =  n 

o  m 

(42) 

Integrate  e  between  star  sightings. 

e(ti  +  =  e(t.)  +  u(i) 

(43) 

A+l 

u'(i)  =  J  n(t)  dt 

*i 

(44) 

The  angle  u'(i)  is  the  drift  error  of  the  gyro  between  the 

star  sighting  time 

and  is  accurately  modeled  by  a  gaussian  random  variable  with  the  following 

statistics: 

E[u'(i)]  =  0 

(45) 

E  lu'(i)  u'(j)J  =  (tj  +  ,  -  t.)  q  6 

(46) 
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Subtract  the  nominal  attitude  predicted  from  the  angle  measured  by  the  star 
mapper. 


z (i)  ^  ‘  “o^)  “  Y(t.)  =  e(t.)  +  w(i)  (47) 

Assume  that  w(i)  is  a  gaussian  random  variable  with  the  following  statistics: 

E  [w(i)l  =  0  (48) 

E  [w(i)  w(j)]  =  r  6-j  (49) 

Define  the  following  variables: 

x(i)  =  e(t.)  (50) 

r<‘>  + .  -*i  <51> 

u{i>  -  tht  (52) 

Hence  Eqs.  (43)  and  (47)  become 

x(  i  +  1)  =  x(i)  +  r(i)  u(i)  (53) 

z(i)  =  x(i)  +  w(i)  (54) 

E  [x(l)J  =0  E  [u(i))  =  0  (55) 

E  [x2(l)]  =  m(i)  E  [u(i  +  j)  u(i)J  =  q  (56) 
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where 


8m(i  +  1)  _  t-2  8m(i) 

ae.  =  k  ae. 


+  k  6 
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+  r2(i) 


(63) 


imili  =  o 

aG|  u 


(64) 


N  may  range  up  to  25  sightings  for  a  single  orbit. 

The  degree  of  correlation  {v(i):  i  =  1,  ....  N}  depends  on  k.  For  insight, 
it  is  convenient  to  let  c(i)  =  1  for  the  present,  so  that  the  sequence 
{v(i):  i  =  1,  ....  N}  is  asymptotically  stationary.  The  correlation  coeffi¬ 
cients  of  {v(i):  i  =  1,  ....  N}  are  computed  from  Eq.  (58). 

_ 2  2 

m  =  k  m+k  r  +  q  m  =  lim  m(i)  (65) 

i  —  oo 


m 


.2 

k  r  +  q 

_ -> 

1  -  k" 


(66) 


c(j)  -  m  -  k1  *  kr  c(j)  =  lim  c(i  +  j,  i)  (67) 

i  —  co 

P(j)  .  (681 

2  2 

As  a  numerical  example,  let  q  =  45  arc  sec  and  r  =  90  arc  sec  .  Then,  one 
finds  from  Eq.  (66),  for  various  values  of  k, 
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1/3 

m  = 

99  arc 

c. 

sec 

1/? 

m  = 

90  arc 

2 

sec 

(69) 

3/4 

m  = 

102  arc 

> 

sec 

Notice  that  m  is  relatively  insensitive  to  variations  in  k  about  the  optimal. 
This  is  true  for  the  nonstationary  case  also.  The  correlogram  for  each  k 
is  graphed  below. 


A 

k  =  T 

O 

k  =  y 

□ 

When  k  =  1/3  (less  than  optimal),  the  filter  is  underdriven  and  suc¬ 
cessive  measurement  residuals  are  positively  correlated.  When  k  =  1/2 
(optimal),  the  residuals  are  uncorrelated  [Lemma  3.2).  When  k  =  3/4 
(greater  than  optimal),  the  filter  is  overdriven  and  successive  measurement 
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residuals  are  negatively  correlated.  In  both  suboptimal  filters  the 
correlation  is  essentially  zero  between  v(i  +  j)  and  v(i)  when  j  is  greater 
than  5.  Intuitively  this  means  that  there  is  very  little  information  about  r 
and  q  in  c(i  +  j  ,  i)  when  j  is  greater  than  5.  Hence  if  c(i)  is  near  1,  on  the 
average,  one  expects  the  band  matrix  approximation  to  be  very  good. 

The  speed  and  accuracy  of  the  band  matrix  approximation  for  this 
example  described  in  Section  IV  were  examined.  The  irregular  star 
sighting  times  were  simulated  by 


r2(i)  =  1  -  £  sin  (f  i)  (70) 

For  simplicity,  the  following  initial  conditions  were  used  for  all  cases: 

x'(l)  =  x(l)  =  0  m(l)  =  90  arc  sec2  (71) 

The  random  numbers  u(i)  and  w(i)  were  precomputed  and  adjusted  to  have 
zero  sample  means  and  sample  variances  equal  to  q  and  r  respectively. 

For  the  case  where  k  =  1/3,  the  following  relative  times  were  required 
to  compute  the  estimate  for  various  band  sizes: 

t  =  0.  24  b  =  1 

t  =  0.44  b  ^  3  (72) 

t  =  1.00  b  =  24 

The  relative  difference  between  the  time  required  for  inverting  a  full  matrix 

and  its  banded  approximation  increases  as  N  increases. 

The  following  tabulation  is  a  list  of  typical  estimates  of  r  (=  90  arc  sec 
2 

and  q  (=  45  arc  sec  )  for  a  single  run  for  various  values  of  b  and  k. 
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Pa  rameters 

Estimates  for  k  =  1/3 

Estimates  for  k  =  3/4 

b  =  1 

b  =  3 

b  =  24 

b  =  1 

b  =  3 

b  =  24 

q 

10.  91 

12.  50 

12.  50 

See 

Text 

14.  19 

15.  76 

r 

149. 08 

142. 36 

142. 84 

158. 09 

141.  96 

134. 59 

The  same  sequence  of  random  numbers  was  used  to  generate  the 
filter  residuals,  so  the  differences  in  the  estimates  are  due  to  the  different 
band  approximations  and  the  filter  gains  k.  For  b  =  24  (the  last  columns), 
the  estimate  is  the  exact  maximum  likelihood  estimate.  Notice  that  the 
approximate  estimate  for  b  =  3  differs  only  slightly  from  the  exact  estimate. 
The  q  estimate  computed  for  b  =  1  and  k  =  3/4  was  negative.  Clearly  a 
better  estimate  in  this  case  is  zero  since  q  is  non-negative. 

The  Cramer-Rao  bound  [Eq.  (24)]  was  computed  for  k  =  1/3. 


cov  [0]  > 


Since  the  elements  of  6  for  this  example  are  always  positive,  a  non-negative 
estimate  (which  is  biased)  would  have  a  mean-square  error  which  is  less 
than  that  of  an  efficient  unbiased  estimate.  This,  in  fact,  appears  to  be  the 
caae  for  this  example.  Ten  runs  were  made  and  the  sample  bias  and  mean- 
square  error  were  computed  for  k  =  1/3. 
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The  sample  mean-square  error  is  lower  than  the  Cramer-Rao  bound. 

Again,  this  can  occur  because  the  maximum  likelihood  estimates  are  biased. 
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SECTION  VI 


CONCLUSION  AND  APPLICATIONS 


The  maximum  likelihood  estimates  of  the  covariance  parameters  of 
a  linear  time-discrete  system  have  been  derived.  Approximations  are 
required,  in  general,  to  make  their  computation  tractable. 

One  application  of  this  work  is  "adaptive"  filtering  where  the  filter 
gain  is  made  a  function  of  the  estimated  covariance  parameters.  References 
6  and  8  contain  examples  of  adaptive  filters.  Another  application  is  off-line 
filtering,  such  as  post-flight  data  reduction.  The  increased  computation 
required  to  estimate  the  unknown  covariance  parameters  is  not  prohibitive 
in  off-line  applications. 


Preceding  page  blank 
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